Logical reasoning over text is an important ability that requires understanding the information present in the text, its interconnections, and then reasoning through it to infer new conclusions. Prior works on improving the logical reasoning ability of language models require complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model with improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss in which only specific parts-of-speech words that would likely require more reasoning than basic language understanding are masked, and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets: APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
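The selective masking idea can be sketched as follows: given POS-tagged tokens, only words in reasoning-heavy categories are eligible for masking. The tag set, mask probability, and pre-tagged input below are illustrative assumptions, not APOLLO's exact configuration.

```python
import random

# Illustrative reasoning-heavy POS tags to mask; APOLLO's actual tag
# selection and masking rate may differ.
REASONING_TAGS = {"VERB", "ADV", "ADJ", "CCONJ"}
MASK_TOKEN = "[MASK]"

def selective_mask(tagged_tokens, mask_prob=0.15, seed=0):
    """Mask only tokens whose POS tag is in REASONING_TAGS;
    all other tokens are always kept."""
    rng = random.Random(seed)
    return [MASK_TOKEN if tag in REASONING_TAGS and rng.random() < mask_prob
            else tok
            for tok, tag in tagged_tokens]

sentence = [("If", "SCONJ"), ("it", "PRON"), ("rains", "VERB"),
            ("then", "ADV"), ("the", "DET"), ("ground", "NOUN"),
            ("is", "VERB"), ("wet", "ADJ")]
masked = selective_mask(sentence, mask_prob=1.0)
# mask_prob=1.0 masks every eligible token:
# ['If', 'it', '[MASK]', '[MASK]', 'the', 'ground', '[MASK]', '[MASK]']
```

Compared with uniform masking, function words such as "If" and "the" are never masked, so the loss concentrates on tokens whose prediction plausibly requires reasoning.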
translated by 谷歌翻译
We propose Universal Document Processing (UDOP), a foundation Document AI model that unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora, using innovative self-supervised objectives, and diverse labeled data. UDOP also learns to generate document images from the text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state of the art on 9 Document AI tasks, e.g., document understanding and QA, across diverse data domains such as finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark (DUE).
The diverse demands of different summarization tasks and their high annotation costs are driving a need for few-shot summarization. However, despite the emergence of many summarization tasks and datasets, the current training paradigm for few-shot summarization systems ignores potentially shareable knowledge in heterogeneous datasets. To this end, we propose \textsc{UniSumm}, a unified few-shot summarization model that is pre-trained on multiple summarization tasks and can be prefix-tuned to excel at any few-shot summarization dataset. Meanwhile, to better evaluate few-shot summarization systems under the principles of diversity and robustness, we assemble and publicize a new benchmark, \textsc{SummZoo}. It consists of $8$ diverse summarization tasks with multiple sets of few-shot samples for each task, covering both monologue and dialogue domains. Experimental results and ablation studies show that \textsc{UniSumm} outperforms strong baseline systems by a large margin across all tasks in \textsc{SummZoo} under both automatic and human evaluations. We release our code and benchmark at \url{https://github.com/microsoft/UniSumm}.
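Prefix-tuning, which \textsc{UniSumm} uses for few-shot adaptation, prepends trainable key/value vectors to a frozen model's attention so that only the per-task prefix is updated. A minimal single-query, single-head sketch in plain Python, assuming simplified scaled dot-product attention (a conceptual illustration, not the \textsc{UniSumm} implementation):

```python
import math

def attend_with_prefix(query, keys, values, prefix_k, prefix_v):
    """Single-query attention where trainable prefix key/value vectors
    are prepended to the frozen model's keys/values. During prefix-tuning,
    only prefix_k / prefix_v would receive gradients."""
    K = prefix_k + keys            # prepend prefix keys
    V = prefix_v + values          # prepend prefix values
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, kvec)) / math.sqrt(d)
              for kvec in K]
    m = max(scores)                # numerically stable softmax
    ws = [math.exp(s - m) for s in scores]
    z = sum(ws)
    ws = [w / z for w in ws]
    # weighted sum over values
    return [sum(w * v[i] for w, v in zip(ws, V)) for i in range(len(V[0]))]

out = attend_with_prefix(query=[1.0, 0.0],
                         keys=[[1.0, 0.0]], values=[[0.0, 1.0]],
                         prefix_k=[[1.0, 0.0]], prefix_v=[[1.0, 0.0]])
# prefix and real key score equally here, so out == [0.5, 0.5]
```

The frozen weights never change; task knowledge is stored entirely in the small prefix, which is why a single pre-trained model can host many few-shot tasks.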
Controllable summarization allows users to generate customized summaries with specified attributes. However, due to the lack of designated annotations of controlled summaries, existing works have to craft pseudo datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling single attributes individually (e.g., a short summary or a highly abstractive summary) rather than controlling a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and human evaluations. However, mixed-attribute control is still challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum.
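As an illustration of hard prompt tuning for mixed-attribute control, one can prepend attribute-value control codes to the source text. The template and attribute names below are hypothetical; MACSum's actual prompt format may differ.

```python
# Hypothetical hard-prompt construction for mixed-attribute controllable
# summarization; the template is an assumption for illustration only.
def build_control_prompt(source, **attributes):
    """Prepend attribute-value control codes (e.g., length, extractiveness,
    topic) to the source text, forming a single hard prompt."""
    controls = "; ".join(f"{k}: {v}" for k, v in attributes.items())
    return f"Summarize ({controls}). Text: {source}"

prompt = build_control_prompt("The city council met on Tuesday...",
                              length="short", extractiveness="high",
                              topic="budget")
# "Summarize (length: short; extractiveness: high; topic: budget). Text: ..."
```

Mixing attributes is just a matter of listing several control codes in one prompt, which is what makes the hard-prompt approach parameter-efficient: no new model weights are introduced.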
Contrastive learning has recently achieved state-of-the-art performance in a wide range of tasks. Many contrastive learning approaches use mined hard negatives to make batches more informative during training, but these approaches are inefficient: they increase epoch length in proportion to the number of mined negatives and require frequent updates of nearest-neighbor indices or mining from recent batches. In this work, we provide an alternative to hard negative mining in supervised contrastive learning: Tail Batch Sampling (TBS), an efficient approximation to the batch assignment problem that upper bounds the gap between the global and training losses, $\mathcal{L}^{Global} - \mathcal{L}^{Train}$. TBS \textbf{improves state-of-the-art performance} in sentence embedding (+0.37 Spearman) and code-search tasks (+2.2\% MRR), is easy to implement (requiring only a few additional lines of code), maintains no external data structures such as nearest-neighbor indices, is more computationally efficient than even the most minimal hard negative mining approaches, and makes no changes to the model being trained.
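A loose sketch of the batch-assignment idea: group examples whose recent losses are similar (and high) into the same batch, so that hard examples co-occur and batches stay informative without any nearest-neighbor mining. This greedy grouping is an illustration only, not the exact TBS algorithm or its theoretical bound.

```python
def group_by_loss(example_losses, batch_size):
    """Assign examples to batches by sorting on their most recent
    training loss (descending) and slicing into consecutive batches,
    so high-loss examples land together.

    example_losses: dict mapping example id -> most recent loss.
    Returns a list of batches (lists of example ids).
    A loose sketch of the batch-assignment idea, not the exact TBS
    algorithm."""
    order = sorted(example_losses, key=example_losses.get, reverse=True)
    return [order[i:i + batch_size]
            for i in range(0, len(order), batch_size)]

losses = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.2}
batches = group_by_loss(losses, batch_size=2)
# [['a', 'c'], ['d', 'b']]
```

Note how this touches no external index: the only state is the per-example loss already produced by training, which is what makes the approach cheap compared with hard negative mining.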
Image registration can be used to quantify morphological changes in longitudinal MR images of prostate cancer patients. This paper describes the development of learning-based registration algorithms for this challenging clinical application, which often has highly variable yet limited training data. First, we report that the latent space can be clustered into a space of much lower dimensionality than is commonly found in the deep bottleneck features of a trained registration network. Based on this observation, we propose a hierarchical quantization method that discretizes the learned feature vectors using a jointly trained dictionary of constrained size, to improve the generalization of the registration network. Furthermore, a novel collaborative dictionary is independently optimized in the latent quantized space to incorporate additional prior information, such as segmentations of the gland or other regions of interest. Based on 216 real clinical images from 86 prostate cancer patients, we show the efficacy of both components. Statistically significant improvements in registration accuracy were obtained in terms of Dice on the gland and target registration error on corresponding landmarks, the latter reaching 5.46 mm, a 28.7% improvement over the baseline without quantization. Experimental results also show that the performance gap between training and test data was indeed minimized.
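The feature-discretization step can be illustrated as flat nearest-neighbor vector quantization against a learned dictionary of constrained size; the hierarchical, jointly trained scheme in the paper is more involved than this minimal sketch.

```python
def quantize(features, dictionary):
    """Map each feature vector to the index of its nearest dictionary
    entry (squared Euclidean distance). A minimal flat vector-quantization
    sketch; the paper's hierarchical, jointly trained scheme is more
    involved.

    features: list of d-dim vectors; dictionary: list of d-dim codewords.
    Returns the list of codeword indices."""
    def nearest(f):
        return min(range(len(dictionary)),
                   key=lambda j: sum((a - b) ** 2
                                     for a, b in zip(f, dictionary[j])))
    return [nearest(f) for f in features]

codes = quantize([[0.1, 0.0], [0.9, 1.1]],
                 [[0.0, 0.0], [1.0, 1.0]])
# codes -> [0, 1]
```

Constraining the dictionary size caps the number of distinct latent states the network can rely on, which is the regularization effect the paper exploits to improve generalization on limited training data.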
Language models demonstrate both quantitative improvements and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Drug resistance is a major threat to global health and a significant concern throughout clinical treatment of diseases and drug development. Mutations in proteins related to drug binding are a common cause of acquired drug resistance. Therefore, quantitative estimation of how mutations affect the interaction between a drug and its target protein is of vital importance for drug development and clinical practice. Computational methods that rely on molecular dynamics simulations, Rosetta protocols, and machine learning methods have been shown to be capable of predicting ligand-affinity changes upon protein mutation. However, the severely limited sample size and heavy noise induce overfitting and generalization issues that have broadly hindered the adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, named SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. In particular, the proposed method ranks training samples following a specific scheme that starts with easy-to-learn samples and gradually incorporates harder ones into training, and then iterates between sample-weight recalculation and model updating. In addition, we compute physics-based structural features to provide the machine learning model with valuable domain knowledge of proteins for this data-limited prediction task. The experiments confirm the effectiveness of the proposed method at predicting kinase-inhibitor resistance under three scenarios, achieving predictive accuracy comparable to that of molecular dynamics and Rosetta methods with far less computational cost.
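The easy-to-hard sample scheduling follows the spirit of self-paced learning. A generic hard-weighting sketch (not the exact SPLDExtraTrees scheme): a sample participates in the current training round only if its loss is below an age parameter λ, and raising λ between rounds admits harder samples.

```python
def self_paced_weights(losses, lam):
    """Hard self-paced weighting: weight 1.0 for samples whose current
    loss is below the age parameter lam, else 0.0. Raising lam over
    training rounds gradually admits harder samples. A generic
    self-paced-learning sketch, not the exact SPLDExtraTrees scheme."""
    return [1.0 if loss < lam else 0.0 for loss in losses]

losses = [0.2, 0.9, 0.4, 1.5]
early = self_paced_weights(losses, lam=0.5)   # only the easy samples
later = self_paced_weights(losses, lam=1.0)   # a harder sample joins
```

Training then alternates between fitting the model on the currently weighted samples and recomputing losses (hence weights), which matches the iterate-between-reweighting-and-updating loop described above.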
Background: Electronic health records (EHRs) contain rich information on patients' health histories, typically including both structured and unstructured data. There have been many studies focusing on distilling valuable information from structured data, such as disease codes, laboratory test results, and treatments. However, relying on structured data alone may be insufficient to reflect patients' comprehensive information, and such data may occasionally contain erroneous records. Objective: With recent advances in machine learning (ML) and deep learning (DL) techniques, an increasing number of studies seek to obtain more accurate results by incorporating unstructured free-text data. This paper reviews studies that use multimodal data, i.e., a combination of structured and unstructured data, from EHRs as input for conventional ML or DL models to address target tasks. Materials and Methods: We searched the Institute of Electrical and Electronics Engineers (IEEE) Digital Library, PubMed, and the Association for Computing Machinery (ACM) Digital Library for articles related to ML-based multimodal EHR studies. Results and Discussion: With the final 94 included studies, we focus on how data from different modalities were combined and interacted with using conventional ML and DL techniques, and how these algorithms were applied in EHR-related tasks. We further investigate the advantages and limitations of these fusion methods and indicate future directions for ML-based multimodal EHR research.
Few-shot learning, especially few-shot image classification, has received increasing attention in recent years and has witnessed significant progress. Some recent studies implicitly show that many generic techniques or "tricks", such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of few-shot learning methods. Moreover, different works may employ different software platforms, different training schedules, different backbone architectures, and even different input image sizes, making fair comparisons difficult and leaving practitioners struggling with reproducibility. To address these issues, we propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch. Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmark datasets with various backbone architectures to assess common pitfalls and the effects of different training tricks. In addition, given recent doubts about the necessity of a meta- or episodic-training mechanism, our evaluation results show that such a mechanism is still necessary, especially when combined with pre-training. We hope our work can not only lower the barrier for beginners working on few-shot learning but also remove the effects of nontrivial tricks, facilitating intrinsic research on few-shot learning. The source code is available at https://github.com/rl-vig/libfewshot.